Abstract

Systolic blood pressure and diastolic blood pressure are known risk factors for cardiovascular diseases, and
understanding their genetic basis will have important public health implications. For rare variants, it is extremely
challenging to make statistical inferences with single-marker tests. Therefore, joint analysis of a set of variants has been
proposed. In this paper, we applied 2 recently proposed methods, the "test for testing the effect of an optimally
weighted combination of variants" (TOW) and "variable weight-TOW" (VW-TOW), to determine genetic regions that are associated
with blood pressure. Then the least absolute shrinkage and selection operator (Lasso) and sparse partial least squares (SPLS)
methods were used to identify significant markers within a gene or in intergenic regions. We investigated the
effect of rare variants and common variants, and their combined effect.



Background 

It is well known that high blood pressure is an impor- 
tant risk factor for cardiovascular diseases. Elevated 
blood pressure is a complicated trait that affects more 
than 30% of the adult population [1,2]. An increase in 
systolic and diastolic blood pressure has a continuous 
impact on the risk of cardiovascular diseases. Globally, 
every year, high blood pressure contributes to approxi- 
mately 13.5% of premature deaths, 54% of stroke, and 
47% of ischemic heart disease [1,3]. Genetic inheritance is
one of the major risk factors for hypertension. For com- 
plex diseases, the common disease-common variant (CD- 
CV) hypothesis that underpins genome-wide association 
studies (GWAS) has led to the identification of several 
novel susceptibility loci. However, a majority of the herit- 
ability is unexplained. It has been pointed out that the 
GWAS-identified variants can only explain a small por- 
tion of the heritability; therefore, exploration is still 
needed to unveil the undiscovered variants [12]. Recently,
arguments have been put forward against CD-CV, and
common disease-rare variants (CD-RV) has been proposed
as an alternative. It is based on the assumption that
common diseases are caused by the cumulative effect of
multiple rare variants [4,5]. An emerging third hypothesis
states that common diseases are caused by a combination
of common and rare variants [6-8].

In this paper, we focused on identifying whether a gene 
is associated with blood pressure. We applied recently 
proposed tests called "test for testing the effect of an 
optimally weighted combination of variants (TOW)" and 
"variable weight-TOW (VW-TOW)" [9] to determine 
significant genetic regions. We were also interested in
identifying the associated variants within the regions
found to be significant, by applying the sparse methods
Lasso and SPLS [10,11].

Methods 

Data 

Both the real and simulated data that were made available 
for Genetic Analysis Workshop 18 (GAW18) were used. 
We focused on the genotype data on chromosome 3 for 
unrelated individuals. The baseline data for the covariates 
and the phenotypes were considered. We considered the
first time point of systolic blood pressure (SBP) and diastolic
blood pressure (DBP) as the traits. We also used a
composite of the 2 phenotypes called the mean arterial
pressure (MAP), which is defined as (2/3)*DBP + (1/3)*SBP. For
the genotype data, we mapped single-nucleotide polymorphisms
(SNPs) to genes; the remaining SNPs, which
do not belong to any gene, were grouped into intergenic
regions. A total of 2286 regions (1224 genes
and 1062 intergenic regions) that include all the SNPs
were defined. Each region was further divided into "rare"
and "common" subregions based on a minor allele frequency (MAF)
threshold of 0.01.
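
The composite phenotype and the MAF-based split can be illustrated with a minimal Python sketch. The function names are hypothetical, and two details are assumptions not stated in the text: a variant is called "rare" when its MAF is strictly below the threshold, and monomorphic markers are dropped before the split.

```python
def mean_arterial_pressure(sbp, dbp):
    """MAP as defined in the text: (2/3)*DBP + (1/3)*SBP."""
    return (2.0 / 3.0) * dbp + (1.0 / 3.0) * sbp


def minor_allele_frequency(genotypes):
    """MAF of one SNP from genotypes coded 0, 1, 2 (None marks a missing call)."""
    observed = [g for g in genotypes if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2.0 * len(observed))
    return min(freq, 1.0 - freq)


def split_region(snp_genotypes, threshold=0.01):
    """Split the SNPs of one region into 'rare' and 'common' lists by MAF.

    Assumption: rare means MAF < threshold; monomorphic SNPs are skipped.
    """
    rare, common = [], []
    for snp_id, genotypes in snp_genotypes.items():
        maf = minor_allele_frequency(genotypes)
        if maf == 0.0:
            continue  # no variation in the sample
        (rare if maf < threshold else common).append(snp_id)
    return rare, common
```

With 129 individuals, a single heterozygous carrier gives MAF = 1/258, so singletons fall into the "rare" subregion under this rule.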

Association tests 

TOW and VW-TOW are recently proposed methods 
that allow covariates and account for direction effects for 
causal variants. Let $z_i = (z_{i1}, \ldots, z_{ip})^T$, $x_i = (x_{i1}, \ldots, x_{iM})^T$,
and $y_i$ be the covariates, genotype (coded 0, 1, 2), and phenotype
for the $i$th of $n$ individuals, where $p$ and $M$ denote the number
of covariates and variants, respectively. The effects of
the covariates on $y_i$ and $x_{im}$ are adjusted for by taking the residuals
of the following linear models:

$$y_i = \alpha_0 + \alpha_1 z_{i1} + \cdots + \alpha_p z_{ip} + \varepsilon_i \qquad (1)$$

and

$$x_{im} = \alpha_{0m} + \alpha_{1m} z_{i1} + \cdots + \alpha_{pm} z_{ip} + \tau_{im}. \qquad (2)$$

The methods are based on the optimal weighting
scheme, which is defined as

$$w_m^o = \frac{\sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(\tilde{x}_{im} - \bar{\tilde{x}}_m)}{\sum_{i=1}^{n} (\tilde{x}_{im} - \bar{\tilde{x}}_m)^2},$$

where $\tilde{y}_i$ and $\tilde{x}_{im}$ denote the residuals from equations (1)
and (2) for the $i$th individual, respectively. Let
$x_i^o = \sum_{m=1}^{M} w_m^o \tilde{x}_{im}$. The test statistic for TOW is defined
as

$$T_{TOW} = \sum_{i=1}^{n} (\tilde{y}_i - \bar{\tilde{y}})(x_i^o - \bar{x}^o).$$

For VW-TOW, let $T_r$ and $T_c$ denote the test statistics of TOW for rare
and common variants, let

$$T_\lambda = \lambda \frac{T_r}{\sqrt{\mathrm{var}(T_r)}} + (1 - \lambda) \frac{T_c}{\sqrt{\mathrm{var}(T_c)}},$$

and let $p_\lambda$ be the p value of $T_\lambda$. The test statistic for VW-TOW is
defined as

$$T_{VW\text{-}TOW} = \min_{0 \le \lambda \le 1} p_\lambda = \min_{0 \le k \le K} p_{\lambda_k},$$

where $\lambda_k = k/K$ for $k = 0, 1, \ldots, K$. The p values are evaluated
by permutation.
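
The two tests can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes the trait and genotypes have already been adjusted for covariates, it estimates the variances of the rare and common statistics from the permuted values, and it ranks the observed statistic together with the permuted ones (so the smallest attainable p value is 1/(B+1) for B permutations). All function names and default values are hypothetical.

```python
import random


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def tow_statistic(y, X):
    """TOW statistic; y and the columns of X are assumed covariate-adjusted."""
    yc = _center(y)
    stat = 0.0
    for m in range(len(X[0])):
        xc = _center([row[m] for row in X])
        sxx = sum(a * a for a in xc)
        if sxx == 0.0:
            continue  # monomorphic variant: weight undefined, skip it
        sxy = sum(a * b for a, b in zip(yc, xc))
        stat += sxy * sxy / sxx  # optimal weight (sxy / sxx) times sxy
    return stat


def _permuted_stats(y, regions, n_perm, rng):
    """Row 0 holds the observed statistics, rows 1..n_perm the permuted ones."""
    stats = [[tow_statistic(y, X) for X in regions]]
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        stats.append([tow_statistic(y_perm, X) for X in regions])
    return stats


def _upper_tail_p(values):
    """p value of each entry, ranked against all entries (ties count)."""
    return [sum(1 for other in values if other >= v) / len(values) for v in values]


def _sd(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return var ** 0.5 or 1.0  # fall back to 1.0 for a constant statistic


def tow_p_value(y, X, n_perm=199, seed=0):
    stats = _permuted_stats(y, [X], n_perm, random.Random(seed))
    return _upper_tail_p([row[0] for row in stats])[0]


def vw_tow_p_value(y, X_rare, X_common, K=10, n_perm=199, seed=0):
    stats = _permuted_stats(y, [X_rare, X_common], n_perm, random.Random(seed))
    t_r = [row[0] for row in stats]
    t_c = [row[1] for row in stats]
    sd_r, sd_c = _sd(t_r[1:]), _sd(t_c[1:])
    min_p = [1.0] * len(stats)
    for k in range(K + 1):  # grid lambda_k = k / K
        lam = k / K
        t_lam = [lam * r / sd_r + (1 - lam) * c / sd_c for r, c in zip(t_r, t_c)]
        min_p = [min(a, b) for a, b in zip(min_p, _upper_tail_p(t_lam))]
    # A small minimum p value is extreme, so the final comparison is lower-tail.
    return sum(1 for p in min_p if p <= min_p[0]) / len(min_p)
```

Because each variant contributes the squared covariance divided by the genotype variance, the statistic is unchanged when a variant's coding is flipped, which is how the weighting accommodates effects in either direction. The same set of permutations is reused for every grid value, so the minimum over the grid is itself calibrated by permutation.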

After identifying the significant genomic regions, we 
further investigated the SNPs that have important con- 
tribution to the phenotypes for the significant regions 
by variable selection methods Lasso and SPLS, which 
are available in the R package "RVtests". Because this
package does not allow covariates, we adjusted the effect 
of environmental factors using the linear model shown 
in equation (1). Instead of the observed trait, the resi- 
duals from the linear model are treated as the 
phenotype. 
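
The covariate adjustment followed by sparse selection can be sketched as below. This is a generic illustration using NumPy, not the implementation in the R package used for the analysis: the Lasso is fitted by plain cyclic coordinate descent, SPLS is not shown, and the function names, the objective scaling, and the penalty value in the usage are assumptions.

```python
import numpy as np


def residualize(y, Z):
    """Residuals of y after least-squares regression on covariates Z plus an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    X = X - X.mean(axis=0)  # centring removes the need for an intercept
    resid = y - y.mean()
    beta = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP can never be selected
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - penalty, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)  # keep resid = y - X @ beta
            beta[j] = new
    return beta
```

A SNP is treated as selected when its coefficient is nonzero; a larger penalty selects fewer SNPs.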



A summary of the steps we followed for real data 

Step 1: Map the SNPs to gene and intergenic regions 
based on the annotation file refGene.txt.gz (available
from http://hgdownload.cse.ucsc.edu/). Then the genes 
or intergenic regions were further divided into subre- 
gions ("rare" vs. "common") based on a threshold of 
MAF = 0.01. 

Step 2: Extract the genotype, phenotype (baseline mea- 
sures) and covariates (baseline measures) data for the 
unrelated individuals. Remove the participants with
missing values in the phenotype or covariate data.

Step 3: Apply TOW and VW-TOW to identify
the regions that are associated with the traits.

Step 4: Apply Lasso and SPLS to the regions to discriminate
the associated variants from noise (using the R
package "RVtests").
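
Step 1 of the summary above can be sketched as follows. This is a simplified illustration that assumes the gene intervals do not overlap and that SNPs are identified by their base-pair positions; parsing of the annotation file is not shown, and the region labels and function names are hypothetical.

```python
from bisect import bisect_right


def build_regions(genes):
    """Turn (name, start, end) gene intervals into an ordered list of
    gene and intergenic regions. Assumes the genes do not overlap."""
    regions = []
    cursor = 0  # first position not yet covered by a region
    previous = "start"
    for name, start, end in sorted(genes, key=lambda g: g[1]):
        regions.append((f"Inter:{previous}-{name}", cursor, start - 1))
        regions.append((name, start, end))
        cursor, previous = end + 1, name
    regions.append((f"Inter:{previous}-end", cursor, float("inf")))
    return regions


def assign_snps(positions, regions):
    """Map each SNP position to the region that contains it."""
    starts = [region[1] for region in regions]
    assignment = {}
    for pos in positions:
        idx = bisect_right(starts, pos) - 1
        assignment.setdefault(regions[idx][0], []).append(pos)
    return assignment
```

Only regions that receive at least one SNP appear in the result, so an intergenic gap without genotyped markers does not become a region to test.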

Results 
Real data 

The sample used in our analysis consists of 142 independent
individuals. After removing individuals with missing values,
129 subjects were analyzed. There are, in total,
1,215,296 markers on chromosome 3; approximately
one-sixth of the markers were removed because they showed
no variation across the 129 independent samples.

The association tests (TOW and VW-TOW) were 
applied to each genetic region for SBP, DBP, and mean 
arterial pressure (MAP) on chromosome 3. Both tests 
produce an empirical p value based on 10,000 permutations
for each region. Figure 1 displays the p value plots
for DBP, where the x-axis denotes the position of the
genes in their original order on chromosome 3. The p values
for intergenic regions are not included in Figure 1. By
comparing the panels in parallel, we can see whether the
effect of a gene is driven by the rare variants or by the
common variants. We note that there is a small cluster of
genes that appear highly significant around the 440th
region in the upper and lower plots.

After obtaining all the p values, we selected the regions
most strongly associated with the traits according
to the ranking of the p values. We set the
significance threshold at 0.001 so as to be more
selective. Genes are selected only if they satisfy this criterion
for the trait using both TOW and VW-TOW.
Table 1 lists the regions that appear to be potentially 
important. For SBP, there are 3 genes; 2 genes with 
common variants only and 1 gene with rare variants 
only are highly associated with the trait. For MAP there 
are 4 genes when a combined analysis of "rare" and 
"common" variants is done, and 2 genes are significant 
with common variants only. For DBP, the number of
significant regions is greater than for the other 2 traits. For
this trait, not only variants that belong to genes but
also variants in intergenic regions exhibit strong
association.

Figure 1 p value plot for DBP (rare variants); p value plot for DBP (common variants); and p value plot for DBP (combined).

As mentioned earlier, there is a cluster of regions 
(shown in Figure 1) that show strong significance for 
DBP. The region names are: TWF2, PPM1M, region
between PPM1M and WDR82, WDR82, region between
WDR82 and GLYCTK, GLYCTK, region between
GLYCTK and DNAH1, BAP1, PHF7, SEMA3G, and
TNNC1. The above regions all fall inside the physical
location range of (52262625, 52488057).

The variable selection methods Lasso and SPLS were then
applied to the regions selected at the gene (or
region) level. Table 1 also summarizes the numbers of
significant markers that were selected using these sparse
methods. The number of selected markers varies
with the choice of penalty parameter.

Simulated data 

In stage I, we focused on the top significant genes on
chromosome 3, which are MAP4, FLNB, and ABTB1, with
common and rare variants combined. We analyzed all 200
replicates with the target genes to assess the power of
TOW and VW-TOW. MAP4 has a large effect on both SBP
and DBP, whereas FLNB and ABTB1 have small effects on
SBP only. We adjusted the phenotypes for all the covariates
at baseline. Table 2 reports the results. We can see that
both methods have very poor power when the variants in
the genes are all rare. Table 2 does show, however, that
TOW performs better than VW-TOW in most
cases. Using MAP as the phenotype gives better power than
using either of the other 2 phenotypes. In the cases of
small effect size, the power is very
low for both TOW and VW-TOW.
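
Power in a replicate-based simulation of this kind is typically estimated as the fraction of replicates in which the test rejects. The sketch below shows that calculation together with its binomial standard error; the significance level used for the power tables is not stated in the text, so `alpha` is an assumed parameter, and the function name is hypothetical.

```python
def empirical_power(p_values, alpha=0.05):
    """Estimated power and its binomial standard error.

    p_values holds one p value per simulated replicate for a causal region.
    """
    if not p_values:
        raise ValueError("need at least one replicate")
    n = len(p_values)
    power = sum(1 for p in p_values if p < alpha) / n
    std_err = (power * (1.0 - power) / n) ** 0.5
    return power, std_err
```

With 200 replicates the standard error is at most about 0.035 (at power 0.5), which gives a sense of how large a difference in power between two methods can be resolved.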

In stage II, we assessed the performance of Lasso and
SPLS by analyzing all 200 replicates on MAP4 with all
the variants. There are 6 target SNPs in MAP4, but 1 of
the SNPs was removed because it is monomorphic. The
positions of the remaining 5 SNPs are 48040283, 47957996,
47956424, 48040284, and 47913455. Both Lasso and
SPLS are variable selection methods. With careful
selection of the penalty parameters for both methods, on
average approximately 5 variants are selected in each
replicate. Table 3 shows the results. We can see that
using MAP as the phenotype yields higher power
than using SBP or DBP. Lasso and SPLS have very poor
power to detect 47956424 and 47913455.
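
One plausible reading of "power" for a variable selection method is the proportion of replicates in which a given target SNP is among the selected variants. The sketch below computes that proportion per target; this interpretation and the function name are assumptions, not a description of how Table 3 was produced.

```python
from collections import Counter


def selection_frequency(selected_per_replicate, targets):
    """Proportion of replicates in which each target SNP was selected.

    selected_per_replicate: one collection of selected SNP positions per replicate.
    """
    target_set = set(targets)
    counts = Counter()
    for selected in selected_per_replicate:
        counts.update(set(selected) & target_set)
    n = len(selected_per_replicate)
    return {t: counts[t] / n for t in targets}
```

Applied to the 5 polymorphic target positions listed above, this would return one proportion per SNP and per phenotype.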

Discussion 

Most recently proposed methods assign large weights to
rare variants and small weights to common variants,
resulting in low power. On the other hand, TOW and
VW-TOW assign each variant its own weight, which can
account for the direction of the variant's effect.
The methods outperform some currently popular methods,
such as the combined multivariate and collapsing
(CMC) method and the sequence kernel association test (SKAT), in
various scenarios [9]. In addition, both TOW and VW-TOW
can be modified to account for population stratification
using a principal component approach.

Table 1 Common, rare and total number of variants identified by LASSO, SPLS and by both methods.
A, number of total variants; B, number of variants selected by LASSO; C, number of variants selected by SPLS; D, number of variants selected by both LASSO and SPLS;
Inter-376, region between PPM1M & WDR82; Inter-377, region between WDR82 & GLYCTK; Inter-378, region between GLYCTK & DNAH1; Inter-530, region between
NXPE3 & LOC152225; Inter-896, region between RPL22L1 & EIF5A2; Inter-898, region between SLC2A2 & TNIK; Inter-982, region between ST6GAL1 & RPL39L

Overall, we were able to detect some significant genes
based on association tests (TOW and VW-TOW) with
SBP, DBP, and MAP. Although we used Lasso and SPLS
only as variant selection methods, they can also be used
to test the association of genotypes with complex
traits. However, both Lasso and SPLS are very computationally
intensive. In addition, our analysis focused on
the independent subjects only, which limits our sample
size. For future study, it is essential to incorporate
family structure, which increases not only the size of the
sample available for analysis but also the number of
variants. Because SBP and DBP are correlated, it is inefficient
to analyze them separately. MAP, which is a combination
of SBP and DBP, has better power than SBP
and DBP separately.

Table 2 Power of TOW and VW-TOW to detect MAP4, FLNB, and ABTB1

Table 3 Power of Lasso and SPLS to select significant variants